[SPARK-9240] [SQL] Hybrid aggregate operator using unsafe row#7813
[SPARK-9240] [SQL] Hybrid aggregate operator using unsafe row#7813yhuai wants to merge 19 commits intoapache:masterfrom yhuai:AggregateOperator
Conversation
|
Test build #39133 has finished for PR 7813 at commit
|
|
Test build #39171 has finished for PR 7813 at commit
|
|
Seems even if we use a single element of an array or a struct (e.g. |
|
The failed query is |
|
Test build #39240 has finished for PR 7813 at commit
|
…itial version of the hybrid iterator.
…or of the iterato will read at least one row from a non-empty input iter.
|
Test build #39480 has finished for PR 7813 at commit
|
|
Test build #39481 has finished for PR 7813 at commit
|
There was a problem hiding this comment.
why do we want a config flag here?
|
Test build #39497 has finished for PR 7813 at commit
|
|
Test build #39498 has finished for PR 7813 at commit
|
Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
|
Test build #39502 has finished for PR 7813 at commit
|
There was a problem hiding this comment.
this is an error case -- we should throw IllegalStateException, and make it clear that if we hit this path, it's a bug.
Right now it sounds as if this operator just cannot handle a legitimate case.
|
Test build #39509 has finished for PR 7813 at commit
|
|
Test build #39517 has finished for PR 7813 at commit
|
|
I'm going to merge this. I think this needs more refactoring, but we can do those as follow-ups. |
There was a problem hiding this comment.
we should just remove this function and inline it. We don't want an extra iterator overhead to process the rows.
Each iterator actually adds a lot of overhead, and here it doesn't buy you any code reduction (on the contrary it increases complexity due to the extra abstraction).
…w up) This is the followup of #7813. It renames `HybridUnsafeAggregationIterator` to `TungstenAggregationIterator` and makes it only work with `UnsafeRow`. Also, I add a `TungstenAggregate` that uses `TungstenAggregationIterator` and make `SortBasedAggregate` (renamed from `SortBasedAggregate`) only works with `SafeRow`. Author: Yin Huai <yhuai@databricks.com> Closes #7954 from yhuai/agg-followUp and squashes the following commits: 4d2f4fc [Yin Huai] Add comments and free map. 0d7ddb9 [Yin Huai] Add TungstenAggregationQueryWithControlledFallbackSuite to test fall back process. 91d69c2 [Yin Huai] Rename UnsafeHybridAggregationIterator to TungstenAggregateIteraotr and make it only work with UnsafeRow.
…w up) This is the followup of #7813. It renames `HybridUnsafeAggregationIterator` to `TungstenAggregationIterator` and makes it only work with `UnsafeRow`. Also, I add a `TungstenAggregate` that uses `TungstenAggregationIterator` and make `SortBasedAggregate` (renamed from `SortBasedAggregate`) only works with `SafeRow`. Author: Yin Huai <yhuai@databricks.com> Closes #7954 from yhuai/agg-followUp and squashes the following commits: 4d2f4fc [Yin Huai] Add comments and free map. 0d7ddb9 [Yin Huai] Add TungstenAggregationQueryWithControlledFallbackSuite to test fall back process. 91d69c2 [Yin Huai] Rename UnsafeHybridAggregationIterator to TungstenAggregateIteraotr and make it only work with UnsafeRow. (cherry picked from commit 3504bf3) Signed-off-by: Reynold Xin <rxin@databricks.com>
…w up) This is the followup of apache/spark#7813. It renames `HybridUnsafeAggregationIterator` to `TungstenAggregationIterator` and makes it only work with `UnsafeRow`. Also, I add a `TungstenAggregate` that uses `TungstenAggregationIterator` and make `SortBasedAggregate` (renamed from `SortBasedAggregate`) only works with `SafeRow`. Author: Yin Huai <yhuai@databricks.com> Closes #7954 from yhuai/agg-followUp and squashes the following commits: 4d2f4fc [Yin Huai] Add comments and free map. 0d7ddb9 [Yin Huai] Add TungstenAggregationQueryWithControlledFallbackSuite to test fall back process. 91d69c2 [Yin Huai] Rename UnsafeHybridAggregationIterator to TungstenAggregateIteraotr and make it only work with UnsafeRow.
This PR adds a base aggregation iterator
AggregationIterator, which is used to createSortBasedAggregationIterator(for sort-based aggregation) andUnsafeHybridAggregationIterator(first it tries hash-based aggregation and falls back to the sort-based aggregation (using external sorter) if we cannot allocate memory for the map). With these two iterators, we will not need existing iterators and I am removing those. Also, we can use a single physicalAggregateoperator and it internally determines what iterators to used.https://issues.apache.org/jira/browse/SPARK-9240